Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models ranging in scale from 2 billion to 27 billion parameters, delivers the best performance for its size and even offers competitive alternatives to models that are 2-3 times bigger.
The new AlphaFold model demonstrates substantially improved accuracy over many previous specialized tools: far greater accuracy for protein–ligand interactions compared with state-of-the-art docking tools, much higher accuracy for protein–nucleic acid interactions compared with nucleic-acid-specific predictors and substantially higher antibody–antigen prediction accuracy.
This work introduces Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model that vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks and provides a model fine-tuned to follow instructions, Mixtral 8x7B - Instruct, that surpasses GPT-3.5 Turbo, Claude-2.1, Gemini Pro, and the Llama 2 70B chat model on human benchmarks.
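To make the Sparse Mixture of Experts idea above concrete, here is a minimal sketch of top-2 expert routing in the spirit of Mixtral's SMoE layers; the tensor shapes, loop structure, and expert modules are illustrative assumptions, not Mixtral's actual fused implementation.

    import torch
    import torch.nn.functional as F

    def smoe_forward(x, gate_weight, experts, top_k=2):
        # x: (tokens, d_model); gate_weight: (n_experts, d_model);
        # experts: list of callables mapping (tokens, d_model) -> (tokens, d_model).
        logits = x @ gate_weight.T                   # router scores per expert
        weights, idx = logits.topk(top_k, dim=-1)    # keep only the top-2 experts per token
        weights = F.softmax(weights, dim=-1)         # renormalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(top_k):
            for e, expert in enumerate(experts):
                mask = idx[:, k] == e                # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

Only the selected experts run on each token, which is how such models keep per-token compute close to that of a much smaller dense model despite their large total parameter count.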
This work improves existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales and presents a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens.
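As a rough illustration of biasing noise-level sampling rather than drawing it uniformly, the sketch below samples rectified-flow timesteps from a logit-normal distribution, which concentrates training on intermediate noise levels; the distribution choice, parameter values, and model signature are assumptions for illustration, not the paper's exact recipe.

    import torch

    def sample_timesteps(batch_size, loc=0.0, scale=1.0):
        # Logit-normal sampling: t = sigmoid(N(loc, scale)) clusters around 0.5
        # instead of being uniform on [0, 1].
        return torch.sigmoid(torch.randn(batch_size) * scale + loc)

    def rectified_flow_loss(model, x0, t):
        # Straight-line interpolation x_t = (1 - t) * x0 + t * noise,
        # with velocity target noise - x0 (the standard rectified-flow objective).
        noise = torch.randn_like(x0)
        t_ = t.view(-1, *([1] * (x0.dim() - 1)))     # broadcast t over non-batch dims
        x_t = (1 - t_) * x0 + t_ * noise
        target = noise - x0
        pred = model(x_t, t)                          # assumed signature: model(x_t, t)
        return ((pred - target) ** 2).mean()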
This System Card provides a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and the measures the authors have implemented to ensure the model is safe and aligned.
OpenVLA, a 7B-parameter open-source VLA trained on a diverse collection of 970k real-world robot demonstrations, is introduced, and it is shown that OpenVLA can be effectively fine-tuned for new settings, with especially strong generalization results in multi-task environments involving multiple objects and strong language grounding abilities.
The introduction is organized in a unique didactic manner developed by the authors, starting from simpler concepts such as linear programming and single-point methods, and advancing from these to more difficult concepts such as optimality conditions for nonlinear optimization and set-oriented solution algorithms.
This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models, and presents comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development.
A novel post-training recipe significantly improves math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks.
Recent improvements to Job Dispatcher are presented in overview, including its brand-new website and documentation, enhanced visualisations, improved job management, and a rising trend of user reliance on the service from low- and middle-income regions.
Two below-threshold surface code memories on Willow, a distance-7 code and a distance-5 code integrated with a real-time decoder, indicate device performance that, if scaled, could realize the operational requirements of large-scale fault-tolerant quantum algorithms.
The development of TRIPOD+AI is described, the expanded 27-item checklist is presented with a more detailed explanation of each reporting recommendation, and the TRIPOD+AI for Abstracts checklist is presented.
The SCARE 2025 guideline provides an up-to-date framework for surgical case reports in the era of AI and adds specific reporting criteria for AI to ensure that any use of artificial intelligence in a case report is clearly documented, explained, and discussed, including with respect to bias and ethics.
OLMo, a competitive, truly Open Language Model, is built to enable the scientific study of language models, and it is hoped this release will empower the open research community and inspire a new wave of innovation.
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2, a large model that significantly outperforms other models of comparable size and makes the model weights available under an OpenRAIL license.
This work introduces Tulu 3, a family of fully-open state-of-the-art post-trained models, alongside its data, code, and training recipes, serving as a comprehensive guide for modern post-training techniques.
The RewardBench dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety, built to benchmark how reward models perform on challenging, structured, and out-of-distribution queries; the accompanying evaluation presents many findings on the propensity for refusals, reasoning limitations, and instruction-following shortcomings of various reward models, towards a better understanding of the RLHF process.
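The trio structure implies a simple accuracy metric: a reward model gets a trio right when it scores the chosen completion above the rejected one. The sketch below assumes a hypothetical reward_model(prompt, response) -> float interface; it is not RewardBench's actual evaluation harness.

    def pairwise_accuracy(trios, reward_model):
        # trios: iterable of dicts with "prompt", "chosen", and "rejected" strings.
        wins, total = 0, 0
        for trio in trios:
            chosen = reward_model(trio["prompt"], trio["chosen"])
            rejected = reward_model(trio["prompt"], trio["rejected"])
            wins += int(chosen > rejected)
            total += 1
        return wins / total if total else 0.0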
PaliGemma is an open Vision-Language Model based on the SigLIP-So400m vision encoder and the Gemma-2B language model, and it achieves strong performance on a wide variety of open-world tasks.
GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas; these results document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society.
Molmo is presented, a new family of VLMs that are state-of-the-art in their class of openness, with a novel, highly detailed image caption dataset collected entirely from human annotators using speech-based descriptions.
This work facilitates low precision KV cache quantization by incorporating several novel methods, including Per-Channel Key Quantization, and develops custom CUDA kernels for KVQuant, enabling LLaMA-7B to be served with a context length of up to 1 million tokens on a single A100-80GB GPU and up to 10 million tokens on an 8-GPU system.
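Per-channel key quantization, roughly, means computing one quantization scale per key channel (feature dimension) rather than per cached token, because outliers in key activations tend to concentrate in a few channels. The sketch below is a simplified symmetric-integer version under that assumption; KVQuant's actual method includes further components (non-uniform quantization, outlier handling, fused CUDA kernels) not shown here.

    import torch

    def quantize_keys_per_channel(keys, n_bits=4):
        # keys: (num_cached_tokens, head_dim); one scale per channel (column).
        qmax = 2 ** (n_bits - 1) - 1
        scale = keys.abs().amax(dim=0, keepdim=True).clamp(min=1e-8) / qmax
        q = torch.round(keys / scale).clamp(-qmax - 1, qmax).to(torch.int8)
        return q, scale                               # int8 container for 4-bit values is illustrative

    def dequantize_keys(q, scale):
        return q.float() * scale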
The Cosmos World Foundation Model Platform is presented to help developers build customized world models for their Physical AI setups, and it positions a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications.
This work introduces SigLIP 2, a family of new multilingual vision-language encoders that build on the success of the original SigLIP and extend the original image-text training objective with several prior, independently developed techniques into a unified recipe.
Code development continues in line with the Galaxy Project roadmap, with improvements to job scheduling and the user interface, general-purpose graphics processing unit (GPGPU) access for cutting-edge methods, and licensed tool support.
An extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%, which underscores the need for further advancements in this area.
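A concrete way to picture how such function-calling scores are computed: each model response is checked against a reference function call for the expected name and arguments. The JSON shape, field names, and strict-equality matching below are illustrative assumptions, not the benchmark's actual grading code.

    import json

    def call_matches(model_output, reference):
        # reference: {"name": ..., "arguments": {...}}; model_output: raw JSON string.
        try:
            call = json.loads(model_output)
        except json.JSONDecodeError:
            return False
        return (call.get("name") == reference["name"]
                and call.get("arguments") == reference["arguments"])

    reference = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
    output = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
    assert call_matches(output, reference)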
To facilitate scientific research on language model pretraining, Dolma is curated and released, a three-trillion-token English corpus built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials.
This work introduces Aya, a massively multilingual generative language model that follows instructions in 101 languages, of which over 50% are considered lower-resourced, and introduces extensive new evaluation suites that broaden the state of the art for multilingual eval across 99 languages.
With an improved framework for model development and evaluation, a large language model is shown to provide answers to medical questions that are comparable to, or preferred over, those provided by human physicians.
The results show the significant potential of AI in personalizing learning, automating routine tasks, and providing access to knowledge, but also reveal serious risks of exacerbating social inequality and ethical dilemmas.
A new measure of firm-level AI investments is proposed, using a unique combination of worker resume and job postings datasets, which reveals a stark increase in AI investments across sectors.
A simple approach to joint named entity recognition and relation extraction is presented, and it is demonstrated how pretrained large language models can be fine-tuned to extract useful records of complex scientific knowledge.
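One common shape such an approach can take is prompting (or fine-tuning) a language model to emit structured entity and relation records that are then parsed back into tuples; the schema, prompt wording, and field names below are illustrative placeholders rather than the paper's exact format.

    import json

    def build_prompt(passage):
        return (
            "Extract named entities and relations from the passage as JSON with keys "
            '"entities" (list of {"text", "type"}) and '
            '"relations" (list of {"head", "relation", "tail"}).\n\n'
            "Passage: " + passage
        )

    def parse_records(completion):
        # Turn the model's JSON completion into (entities, relations) tuples;
        # fall back to empty lists if the output is not valid JSON.
        try:
            data = json.loads(completion)
        except json.JSONDecodeError:
            return [], []
        entities = [(e.get("text"), e.get("type")) for e in data.get("entities", [])]
        relations = [(r.get("head"), r.get("relation"), r.get("tail"))
                     for r in data.get("relations", [])]
        return entities, relations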
Compared to current editing models, which exhibit degradation in character consistency and stability across multiple turns, FLUX.1 Kontext is observed to show improved preservation of objects and characters, leading to greater robustness in iterative workflows.
The model, called CUT3R (Continuous Updating Transformer for 3D Reconstruction), captures rich priors of real-world scenes: not only can it predict accurate pointmaps from image observations, but it can also infer unseen regions of the scene by probing at virtual, unobserved views.
Blink, a new benchmark for multimodal large language models (LLMs) that focuses on core visual perception abilities not found in other evaluations, is introduced, in the hope of stimulating the community to help multimodal LLMs catch up with human-level visual perception.
Virchow is presented, the largest foundation model for computational pathology to date, and it is demonstrated that a large foundation model enables pan-cancer detection, achieving 0.95 specimen-level area under the receiver operating characteristic curve across nine common and seven rare cancers.
The performance of state-of-the-art dismantling techniques is compared, highlighting their optimal range of applicability for practical problems, and grounded approaches to designing robustness, identifying early-warning signals, and devising adaptive responses are discussed.
The status of InterPro is reported on, detailing new developments in the database, associated web interface and software, including the increased integration of structures predicted by AlphaFold and the enhanced description of protein families using artificial intelligence.
HLE is introduced, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage, to inform research and policymaking upon a clear understanding of model capabilities.
The findings substantiate that the PO is a promising and competitive algorithm, surpassing some existing algorithms in the literature.
NextDenovo is presented, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly and is applied to assemble 35 diverse human genomes from around the world using Nanopore long-read data.
Article Galaxy Pages is a free service from Research Solutions, a company that offers access to content in collaboration with publishing partners, online repositories and discovery services.